This research aims to determine if the postoperative function of a total knee replacement recipient can be modeled based on anthropomorphic data, and details of the surgical procedure. We decided to carry out testing on models with two postoperative scoring systems as the response variable. The first score is the “Surgeon’s Score” which measures more objective postoperative metrics about the condition of the replaced knee following surgery. These metrics include: how straight or how flexed the patient can make their knee, how well aligned the knee appears when inspecting from a frontal perspective, and how much laxity or separation can be achieved by placing the lower leg under tension. Finally, the patient is asked about the overall pain they experience with the knee. The resulting score is a value between 0-100 with 80-100 being an excellent result.
The second score we modeled is the “Patient’s Score” which is generated by three questions: how far are they able to walk, how well can they navigate stairs, and do they require any walking aids (crutches, cane, etc).
We hope to create a model that can predict each of the scores based on a variety of categorical and numeric predictors.
This data has been collected from a Southern California Hospital in collaboration with the Shiley Center for Orthopedic Research. The data is used for both clinical research and postoperative monitoring of implant survivorship. All patient specific information has been removed from the data so it adheres to the Health Insurance Portability and Accountability Act (HIPAA).
We formed our group by advertising on the course forum site and Slack chat. After the group had been formed, the first step was to decide on a topic and a data set for the selected topic. Next, after our proposal had been approved, we cleaned up the data and started working towards finding a final model that is most suitable for our data set. To reach the final model, each of us, worked independently to come up with a list of models. Models were presented and discussed during our regular meetings via Zoom where ideas were exchanged. Eventaully, we converged and decided on a final model that has been used for the work presented here.
We hope to determine the impact a particular surgeon or the type of implants used has on the patient’s outcome after surgery. We are also interested in exploring the potential relationship between anthropomorphic factors and the patient’s postoperative satisfaction. Finally, we would like to understand if the considerable expense of computer navigation in knee replacement leads to better postoperative scores. Nevertheless, we keep our options opened to investigate other aspects that were not ‘visible’ or thought about but might reveal themselves as we move on with the project.
Here we may consider KneeScore_Surgeon or KneeScore_Patient as response and Age, Gender, Weight, Height, BMI, Race, Side, Manufacturer as predictors. Gender, Race, Side and Manufacturer are some of the categorical predictors and Age, Height, Weight and BMI are numerical predictors.
knee = read.csv('knees.csv')
knee$Surgeon = as.factor(knee$Surgeon)
knee$Year = as.factor(knee$Year)
#Function to plot various plots as QQPlot, Fitted vs Residual Plot.
showPlots = function(model) {
par(mfcol=c(1,3))
qqnorm(resid(model), col = red, pch = 16)
qqline(resid(model), col = lightblue, lwd = 2)
plot(fitted(model), resid(model), xlab = 'Fitted Values', ylab = 'Residuals', main = 'Fitted vs. Residuals Plot', col = purple, pch = 16)
abline(h = 0, col = green, lwd = 2)
hist(resid(model), col = blue, main = 'Histogram of Residuals', xlab = 'Residuals')
}
#Function to calculate various statistics which we use to evaluate the model
evaluate = function(model){
# Run each of the tests
lambda = 1.000001 # this is a hack so the function will return a CrossValidation score - HatValues of 1.0 are causing a division by zero error.
shapiroTest = shapiro.test(resid(model))
bpTest = bptest(model)
rSquared = summary(model)$r.squared
adjRSquared = summary(model)$adj.r.squared
loocv = sqrt(mean((resid(model) / (lambda - hatvalues(model))) ^ 2))
large_hat = sum(hatvalues(model) > 2 * mean(hatvalues(model)))
large_resid = sum(rstandard(model)[abs(rstandard(model)) > 2]) / (length(rstandard(model) + lambda))
large_cooks = sum(cooks.distance(model) > 4 / length(cooks.distance(model)))
# Collect tests in dataframe
values = data.frame(Result = c(prettyNum(shapiroTest$p.value), prettyNum(bpTest$p.value), prettyNum(rSquared), prettyNum(adjRSquared),prettyNum(loocv), prettyNum(large_hat), prettyNum(large_cooks)))
rowNames = data.frame(Test = c('Shapiro Wilk pvalue', 'Breusch-Pagan pvalue', 'R Squared', 'Adj R Squared', 'Cross Validation', 'Large Hat Values', 'Influential'))
summary_output = cbind(rowNames, values)
show(summary_output)
showPlots(model)
}
# find and remove all influential values and return the resulting dataframe
removeInfluential = function(model, data){
cooks = which(cooks.distance(model) > 4 / length(cooks.distance(model)))
newData = data[-cooks , ]
return(newData)
}
# identify all influential data points
IdentifyInfluential = function(model, data){
cooks = which(cooks.distance(model) > 4 / length(cooks.distance(model)))
return(index_)
}
# find large hat values, subtract them from the dataframe and return the dataframe
removeLargeHatValues = function(model, data){
largeHat = which(hatvalues(model) > 2 * mean(hatvalues(model)))
newData = data[-largeHat, ]
return(newData)
}
# calculate RMSE
rmse = function(actual, predicted) { sqrt(mean((actual - predicted) ^ 2))}
Before moving onto the formal analysis of our data set, we started by performing visual inspection by plotting dependencies among our the variables. The main focus was to look for any visual relationship between KneeScore_Surgeon, KneeScore_Patient and the rest of the variables. Our choice was to use scatterplotMatrix from the car package as it plots regression lines along with the actual data.
scatterplotMatrix(~ KneeScore_Surgeon + KneeScore_Patient + Surgeon + Year + Age + Gender + Weight + Height, span=0.7, data=knee)
scatterplotMatrix(~ KneeScore_Surgeon + KneeScore_Patient + BMI + Diagnosis + Race + Side + GPS + Manufacturer, span=0.7, data=knee)
scatterplotMatrix(~ KneeScore_Surgeon + KneeScore_Patient + FemoralComponentModel + FemoralComponentSize + FemoralComponentType + TibialTrayModel + TibialInsertType, span=0.7, data=knee)
scatterplotMatrix(~ KneeScore_Surgeon + KneeScore_Patient + TibialTraySize + TibialInsertModel + TibialInsertWidth + PatellaModel + PatellaDiameter, span=0.7, data=knee)
Comments: Based on the above graphics, KneeScore_Surgeon, KneeScore_Patient seems to behave similarly across all variables. This is expected as both, surgeon and patient, should be in consensus in regards to the success or quality of a procedure.
For a better view of the above statement, we plotted the following plot, showing a strong relationship between KneeScore_Surgeon and KneeScore_Patient.
scatterplotMatrix(~ KneeScore_Surgeon + KneeScore_Patient, span=0.7, data=knee)
The distribution of KneeScore_Surgeon which reflects the score of a procedure assigned be the Surgeon is unimodal distribution versus a multimodal distribution of KneeScore_Patient, which is the score assigned by Patient. Our explanation of this is that Surgeon assigns any number between 1 and 100 without any discretion whereas Patient has a tendency to round up the score to the nearest fifth (Ex: 5, 10, 35, 50, etc) which creates the multi-picks of the distribution.
Begin by fitting a full additive model as a baseline and evaluating.
sample_idx = sample(1:nrow(knee), 1500)
train_data = knee[sample_idx, ]
test_data = knee[-sample_idx, ]
surgeon_full_additive_model = lm(KneeScore_Surgeon ~ ., data = train_data)
evaluate(surgeon_full_additive_model)
## Test Result
## 1 Shapiro Wilk pvalue 3.969607e-06
## 2 Breusch-Pagan pvalue 0.6677602
## 3 R Squared 0.2420614
## 4 Adj R Squared 0.1930753
## 5 Cross Validation 14.70631
## 6 Large Hat Values 166
## 7 Influential NA
showPlots(surgeon_full_additive_model)
The baseline model has many large hat values. Let’s try to understand why.
largeHat = which(hatvalues(surgeon_full_additive_model) > (2 * mean(hatvalues(surgeon_full_additive_model))))
print(knee[689, ])
## Surgeon Year Age Gender Weight Height BMI Diagnosis Race Side
## 689 4 2012 62 Female 200 63 35.46 Osteoarthritis White Right
## KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 689 42 30 No Stryker
## FemoralComponentModel FemoralComponentSize FemoralComponentType
## 689 Triathlon 4 cruciate retaining
## TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 689 Triathlon 4 Triathlon 11
## TibialInsertType PatellaModel PatellaDiameter
## 689 high flex Triathlon 32
After inspection of the rows that represent the large hat values it seems that there are a preponderance of values that are either errant entries or valid, but rare entries. For example, the Race predictor has only two entries for Mid-East Arabian. First we circle back on the original dataset and clean the identified errant entries.
print(knee[612, ])
## Surgeon Year Age Gender Weight Height BMI Diagnosis Race
## 612 3 2012 41 Female 182 66 29.4 Osteoarthritis Hispanic
## Side KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 612 Bilateral 39 80 Yes DePuy (J&J)
## FemoralComponentModel FemoralComponentSize FemoralComponentType
## 612 Sigma 4N cruciate retaining
## TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 612 PFC Sigma 3 PFC Sigma 8
## TibialInsertType PatellaModel PatellaDiameter
## 612 curved,Fixed bearing Oval Dome Patella 3-peg 35
print(knee[1496, ])
## Surgeon Year Age Gender Weight Height BMI Diagnosis Race
## 1496 4 2013 66 Female 158 64 27.14 Osteoarthritis White
## Side KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 1496 Right 74 50 No Smith & Nephew
## FemoralComponentModel FemoralComponentSize FemoralComponentType
## 1496 Legion 6N cruciate retaining
## TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 1496 Genesis II 5 Genesis II 9
## TibialInsertType PatellaModel PatellaDiameter
## 1496 high flex Genesis II 29
Refitting the model on the cleaned data only shows a marginal impact on the model so we move on to some other methods.
Next we will try to determine which predictors are significant by comparing the full model to models each with a single predictor removed. Evaluation of this code is set to false for convienience. Comments show the significance of each test.
surgeon_additive_model = lm(KneeScore_Surgeon ~ .-Surgeon, data = train_data)
Gender_additive_model = lm(KneeScore_Surgeon ~ .-Gender, data = train_data)
Race_additive_model = lm(KneeScore_Surgeon ~ .- Race, data = train_data)
Year_additive_model = lm(KneeScore_Surgeon ~ .- Year, data = train_data)
Age_additive_model = lm(KneeScore_Surgeon ~ .- Age, data = train_data)
Weight_additive_model = lm(KneeScore_Surgeon ~ .- Weight, data = train_data)
Height_additive_model = lm(KneeScore_Surgeon ~ .- Height, data = train_data)
BMI_additive_model = lm(KneeScore_Surgeon ~ .- BMI, data = train_data)
Diagnosis_additive_model = lm(KneeScore_Surgeon ~ .- Diagnosis, data = train_data)
Side_additive_model = lm(KneeScore_Surgeon ~ .- Side, data = train_data)
KneeScore_Patient_additive_model = lm(KneeScore_Surgeon ~ .- KneeScore_Patient, data = train_data)
GPS_additive_model = lm(KneeScore_Surgeon ~ .- GPS, data = train_data)
Manufacturer_additive_model = lm(KneeScore_Surgeon ~ .- Manufacturer, data = train_data)
FemoralComponentModel_additive_model = lm(KneeScore_Surgeon ~ .- FemoralComponentModel, data = train_data)
FemoralComponentSize_additive_model = lm(KneeScore_Surgeon ~ .- FemoralComponentSize, data = train_data)
FemoralComponentType_additive_model = lm(KneeScore_Surgeon ~ .- FemoralComponentType, data = train_data)
TibialTrayModel_additive_model = lm(KneeScore_Surgeon ~ .- TibialTrayModel, data = train_data)
TibialTraySize_additive_model = lm(KneeScore_Surgeon ~ .- TibialTraySize, data = train_data)
TibialInsertModel_additive_model = lm(KneeScore_Surgeon ~ .- TibialInsertModel, data = train_data)
TibialInsertWidth_additive_model = lm(KneeScore_Surgeon ~ .- TibialInsertWidth, data = train_data)
TibialInsertType_additive_model = lm(KneeScore_Surgeon ~ .- TibialInsertType, data = train_data)
PatellaModel_additive_model = lm(KneeScore_Surgeon ~ .- PatellaModel, data = train_data)
PatellaDiameter_additive_model = lm(KneeScore_Surgeon ~ .- PatellaDiameter, data = train_data)
anova(surgeon_additive_model,surgeon_full_additive_model) ## Not significant Pr(<F) 0.2061
anova(Gender_additive_model,surgeon_full_additive_model) ## Not significant Pr(<F) 0.3678
anova(Race_additive_model,surgeon_full_additive_model) ## Not significant Pr(<F) 0.3949
anova(Year_additive_model,surgeon_full_additive_model) ## * Some significance Pr(<F) 0.039
anova(Age_additive_model,surgeon_full_additive_model) ## *** Significant Pr(<F) 1.191e-11
anova(Weight_additive_model,surgeon_full_additive_model) ## Not significant Pr(<F) 0.4295
anova(Height_additive_model,surgeon_full_additive_model) ## Not significant Pr(<F) 0.6232
anova(BMI_additive_model,surgeon_full_additive_model) ## Not significant Pr(<F) 0.4046
anova(Diagnosis_additive_model,surgeon_full_additive_model) ## Marginally Significant Pr(<F) 0.067
anova(Side_additive_model,surgeon_full_additive_model) ## *** Significant Pr(<F) 2.2e-16
anova(KneeScore_Patient_additive_model,surgeon_full_additive_model) ## *** Significant Pr(<F) 2.2e-16
anova(GPS_additive_model,surgeon_full_additive_model) ## Not significant Pr(<F) 0.4046
anova(Manufacturer_additive_model,surgeon_full_additive_model) ## Not significant Pr(<F) 0.7153
anova(FemoralComponentModel_additive_model,surgeon_full_additive_model) ## Not significant Pr(<F) 0.1388
anova(FemoralComponentSize_additive_model,surgeon_full_additive_model) ## Not significant Pr(<F) 0.3345
anova(FemoralComponentType_additive_model,surgeon_full_additive_model) ## Not significant Pr(<F) 0.5606
anova(TibialTrayModel_additive_model,surgeon_full_additive_model) ## Not significant Pr(<F) 0.8707
anova(TibialTraySize_additive_model,surgeon_full_additive_model) ## Not significant Pr(<F) 0.1527
anova(TibialInsertModel_additive_model,surgeon_full_additive_model) ## * Some significance Pr(<F) 0.043
anova(TibialInsertWidth_additive_model,surgeon_full_additive_model) ## Not significant Pr(<F) 0.721
anova(TibialInsertType_additive_model,surgeon_full_additive_model) ## Not significant Pr(<F) 0.613
Since there are only a few predictors selected as significant, we can create a smaller model based on those.
surgeon_selective_model = lm(KneeScore_Surgeon ~ Age + Year + Diagnosis + Side + KneeScore_Patient + TibialInsertModel, data = train_data)
evaluate(surgeon_selective_model)
## Test Result
## 1 Shapiro Wilk pvalue 3.789549e-06
## 2 Breusch-Pagan pvalue 0.002675905
## 3 R Squared 0.2110465
## 4 Adj R Squared 0.1954821
## 5 Cross Validation 14.38481
## 6 Large Hat Values 156
## 7 Influential 75
anova(surgeon_selective_model, surgeon_full_additive_model)
## Analysis of Variance Table
##
## Model 1: KneeScore_Surgeon ~ Age + Year + Diagnosis + Side + KneeScore_Patient +
## TibialInsertModel
## Model 2: KneeScore_Surgeon ~ Surgeon + Year + Age + Gender + Weight +
## Height + BMI + Diagnosis + Race + Side + KneeScore_Patient +
## GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize +
## FemoralComponentType + TibialTrayModel + TibialTraySize +
## TibialInsertModel + TibialInsertWidth + TibialInsertType +
## PatellaModel + PatellaDiameter
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1470 297676
## 2 1408 285974 62 11702 0.9293 0.6326
The evaluation doesn’t show much improvement and anova rejects the significance of the smaller model.
We will attempt to utilize AIC and BIC approaches and evaluate the results.
surgeon_model_back_aic = step(surgeon_full_additive_model, direction = "backward", trace = 0)
evaluate(surgeon_model_back_aic)
## Test Result
## 1 Shapiro Wilk pvalue 5.744359e-05
## 2 Breusch-Pagan pvalue 0.003167732
## 3 R Squared 0.207701
## 4 Adj R Squared 0.1958997
## 5 Cross Validation 14.33432
## 6 Large Hat Values 85
## 7 Influential 69
anova(surgeon_model_back_aic, surgeon_full_additive_model)
## Analysis of Variance Table
##
## Model 1: KneeScore_Surgeon ~ Year + Age + Weight + BMI + Side + KneeScore_Patient +
## FemoralComponentModel + PatellaDiameter
## Model 2: KneeScore_Surgeon ~ Surgeon + Year + Age + Gender + Weight +
## Height + BMI + Diagnosis + Race + Side + KneeScore_Patient +
## GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize +
## FemoralComponentType + TibialTrayModel + TibialTraySize +
## TibialInsertModel + TibialInsertWidth + TibialInsertType +
## PatellaModel + PatellaDiameter
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1477 298938
## 2 1408 285974 69 12964 0.9251 0.6511
Removing the Influential observations and running AIC again-
clean_dataset_aic = removeInfluential(surgeon_model_back_aic, train_data)
surgeon_full_additive_model_clean = lm(KneeScore_Surgeon ~ ., data = clean_dataset_aic)
surgeon_model_aic_clean = step(surgeon_full_additive_model_clean, direction = "backward", trace = 0)
evaluate(surgeon_model_aic_clean)
## Test Result
## 1 Shapiro Wilk pvalue 0.002316926
## 2 Breusch-Pagan pvalue 0.0005649064
## 3 R Squared 0.2316975
## 4 Adj R Squared 0.2246488
## 5 Cross Validation 13.05352
## 6 Large Hat Values 28
## 7 Influential 59
anova(surgeon_model_aic_clean, surgeon_full_additive_model_clean)
## Analysis of Variance Table
##
## Model 1: KneeScore_Surgeon ~ Year + Age + Height + BMI + Side + KneeScore_Patient +
## TibialTraySize
## Model 2: KneeScore_Surgeon ~ Surgeon + Year + Age + Gender + Weight +
## Height + BMI + Diagnosis + Race + Side + KneeScore_Patient +
## GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize +
## FemoralComponentType + TibialTrayModel + TibialTraySize +
## TibialInsertModel + TibialInsertWidth + TibialInsertType +
## PatellaModel + PatellaDiameter
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1417 239496
## 2 1341 228175 76 11321 0.8755 0.7678
Utilize a Bayesian Information Criterion approach and evaluate the selected model.
n = length(resid(surgeon_full_additive_model))
surgeon_model_back_bic = step(surgeon_full_additive_model, direction = "backward", k = log(n), trace = 0)
evaluate(surgeon_model_back_bic)
## Test Result
## 1 Shapiro Wilk pvalue 3.533832e-05
## 2 Breusch-Pagan pvalue 0.02174066
## 3 R Squared 0.1779156
## 4 Adj R Squared 0.175716
## 5 Cross Validation 14.4313
## 6 Large Hat Values 68
## 7 Influential 92
coef(surgeon_model_back_bic)
## (Intercept) Age SideLeft SideRight
## 12.3236748 0.2819257 8.7281485 8.5845741
## KneeScore_Patient
## 0.2521117
anova(surgeon_model_back_bic, surgeon_full_additive_model)
## Analysis of Variance Table
##
## Model 1: KneeScore_Surgeon ~ Age + Side + KneeScore_Patient
## Model 2: KneeScore_Surgeon ~ Surgeon + Year + Age + Gender + Weight +
## Height + BMI + Diagnosis + Race + Side + KneeScore_Patient +
## GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize +
## FemoralComponentType + TibialTrayModel + TibialTraySize +
## TibialInsertModel + TibialInsertWidth + TibialInsertType +
## PatellaModel + PatellaDiameter
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1495 310177
## 2 1408 285974 87 24202 1.3697 0.01561 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The resulting BIC model scores marginally better in Cross Validation though the anova results clearly prefer the full additive model.
Lets try the interaction between the predictors which we got in the model selected by Backward BIC.
surgeon_model_back_bic_int = lm(KneeScore_Surgeon ~ Age * Side * KneeScore_Patient, data = train_data)
evaluate(surgeon_model_back_bic_int)
## Test Result
## 1 Shapiro Wilk pvalue 5.860884e-06
## 2 Breusch-Pagan pvalue 0.001950516
## 3 R Squared 0.2038934
## 4 Adj R Squared 0.1980082
## 5 Cross Validation 14.29154
## 6 Large Hat Values 138
## 7 Influential 94
anova(surgeon_model_back_bic_int, surgeon_full_additive_model)
## Analysis of Variance Table
##
## Model 1: KneeScore_Surgeon ~ Age * Side * KneeScore_Patient
## Model 2: KneeScore_Surgeon ~ Surgeon + Year + Age + Gender + Weight +
## Height + BMI + Diagnosis + Race + Side + KneeScore_Patient +
## GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize +
## FemoralComponentType + TibialTrayModel + TibialTraySize +
## TibialInsertModel + TibialInsertWidth + TibialInsertType +
## PatellaModel + PatellaDiameter
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1488 300375
## 2 1408 285974 80 14401 0.8863 0.7519
coef(surgeon_model_back_bic_int)
## (Intercept) Age
## -16.82893122 0.85626307
## SideLeft SideRight
## 35.34068923 23.94000851
## KneeScore_Patient Age:SideLeft
## 0.98984321 -0.58092659
## Age:SideRight Age:KneeScore_Patient
## -0.43222169 -0.01384533
## SideLeft:KneeScore_Patient SideRight:KneeScore_Patient
## -0.67082908 -0.61485672
## Age:SideLeft:KneeScore_Patient Age:SideRight:KneeScore_Patient
## 0.01374840 0.01316275
Perhaps removing some influential data points will assist in creating a better model. Next we remove influential points that satisfy the criteria for Large Cook's Distance values and evaluating BIC again.
clean_dataset = removeInfluential(surgeon_model_back_bic, train_data)
surgeon_full_additive_model_clean = lm(KneeScore_Surgeon ~ ., data = clean_dataset)
surgeon_model_back_bic_clean = step(surgeon_full_additive_model_clean, direction = "backward", k = log(n), trace = 0)
evaluate(surgeon_model_back_bic_clean)
## Test Result
## 1 Shapiro Wilk pvalue 0.0003205851
## 2 Breusch-Pagan pvalue 0.0001228871
## 3 R Squared 0.2288343
## 4 Adj R Squared 0.2266357
## 5 Cross Validation 12.4029
## 6 Large Hat Values 73
## 7 Influential 53
anova(surgeon_model_back_bic_clean, surgeon_full_additive_model_clean)
## Analysis of Variance Table
##
## Model 1: KneeScore_Surgeon ~ Age + Side + KneeScore_Patient
## Model 2: KneeScore_Surgeon ~ Surgeon + Year + Age + Gender + Weight +
## Height + BMI + Diagnosis + Race + Side + KneeScore_Patient +
## GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize +
## FemoralComponentType + TibialTrayModel + TibialTraySize +
## TibialInsertModel + TibialInsertWidth + TibialInsertType +
## PatellaModel + PatellaDiameter
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1403 215164
## 2 1316 198050 87 17114 1.3071 0.03408 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
How about interaction on clean dataset-
surgeon_model_back_bic_int_clean = lm(KneeScore_Surgeon ~ Age * Side * KneeScore_Patient, data = clean_dataset)
evaluate(surgeon_model_back_bic_int_clean)
## Test Result
## 1 Shapiro Wilk pvalue 0.000194294
## 2 Breusch-Pagan pvalue 0.002528189
## 3 R Squared 0.2459378
## 4 Adj R Squared 0.239996
## 5 Cross Validation 12.30977
## 6 Large Hat Values 131
## 7 Influential 44
comparing BIC and Interaction Models
anova(surgeon_model_back_bic_clean, surgeon_model_back_bic_int_clean)
## Analysis of Variance Table
##
## Model 1: KneeScore_Surgeon ~ Age + Side + KneeScore_Patient
## Model 2: KneeScore_Surgeon ~ Age * Side * KneeScore_Patient
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1403 215164
## 2 1396 210392 7 4772.1 4.5234 5.321e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Based on Anova test we can reject the surgeon_model_back_bic_clean model and prefer surgeon_model_back_bic_int_clean.
Next, we will evaluate the RMSE of the best performing models thus far.
| Model | Train RMSE | Test RMSE |
|---|---|---|
Model All+ |
13.8075825 | 14.1531654 |
Selective Model |
14.0872534 | 14.0449973 |
Model AIC |
14.1170897 | 14.5745219 |
Model BIC |
14.3799985 | 14.1825336 |
Model Int |
14.1509713 | 14.131628 |
Model Clean All+ |
14.0419197 | 14.2632453 |
Model Clean AIC |
14.3040352 | 14.1389152 |
Model Clean BIC |
14.3884783 | 14.1955392 |
Model Clean Int |
14.2282859 | 14.1009845 |
Interaction model gives the BEST RMSE also low Cross Validation and High \(Adjusted R^2\), so we will consider this as the best model to predict the KneeScore_Surgeon.
Begin by fitting a full additive model as a baseline and evaluating.
pat_full_additive_model = lm(KneeScore_Patient ~ ., data = knee)
evaluate(pat_full_additive_model)
## Test Result
## 1 Shapiro Wilk pvalue 1.177599e-13
## 2 Breusch-Pagan pvalue 0.257962
## 3 R Squared 0.296855
## 4 Adj R Squared 0.261377
## 5 Cross Validation 17.00694
## 6 Large Hat Values 204
## 7 Influential NA
The baseline model has many large leverage values, which, upon inspection may be coming from several of the implant fields. Perhaps removing the suspect implant predictors will help improve the model.
pat_model_eight = lm(KneeScore_Patient ~ . - Diagnosis - Race - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - TibialTraySize - FemoralComponentType, data = knee)
evaluate(pat_model_eight)
## Test Result
## 1 Shapiro Wilk pvalue 2.572268e-14
## 2 Breusch-Pagan pvalue 0.0001090169
## 3 R Squared 0.2776021
## 4 Adj R Squared 0.2567126
## 5 Cross Validation 16.92618
## 6 Large Hat Values 125
## 7 Influential NA
anova(pat_model_eight, pat_full_additive_model)
## Analysis of Variance Table
##
## Model 1: KneeScore_Patient ~ (Surgeon + Year + Age + Gender + Weight +
## Height + BMI + Diagnosis + Race + Side + KneeScore_Surgeon +
## GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize +
## FemoralComponentType + TibialTrayModel + TibialTraySize +
## TibialInsertModel + TibialInsertWidth + TibialInsertType +
## PatellaModel + PatellaDiameter) - Diagnosis - Race - TibialInsertType -
## FemoralComponentModel - PatellaModel - TibialTrayModel -
## TibialTraySize - FemoralComponentType
## Model 2: KneeScore_Patient ~ Surgeon + Year + Age + Gender + Weight +
## Height + BMI + Diagnosis + Race + Side + KneeScore_Surgeon +
## GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize +
## FemoralComponentType + TibialTrayModel + TibialTraySize +
## TibialInsertModel + TibialInsertWidth + TibialInsertType +
## PatellaModel + PatellaDiameter
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1902 532327
## 2 1863 518140 39 14187 1.308 0.0973 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
While removing the predictors doesn’t yield a definitively better model at a reasonable \(\alpha\), it does increase adjusted \(R^2\) and slightly improves the cross validation score.
Utilize a Bayesian Information Criterion approach and evaluate the selected model.
n = length(resid(pat_full_additive_model))
pat_model_back_bic = step(pat_full_additive_model, direction = "backward", k = log(n), trace = 0)
evaluate(pat_model_back_bic)
## Test Result
## 1 Shapiro Wilk pvalue 2.427597e-15
## 2 Breusch-Pagan pvalue 7.235823e-07
## 3 R Squared 0.2424413
## 4 Adj R Squared 0.2405008
## 5 Cross Validation 16.93997
## 6 Large Hat Values 112
## 7 Influential 148
anova(pat_model_back_bic, pat_full_additive_model)
## Analysis of Variance Table
##
## Model 1: KneeScore_Patient ~ Age + Gender + Weight + Height + KneeScore_Surgeon
## Model 2: KneeScore_Patient ~ Surgeon + Year + Age + Gender + Weight +
## Height + BMI + Diagnosis + Race + Side + KneeScore_Surgeon +
## GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize +
## FemoralComponentType + TibialTrayModel + TibialTraySize +
## TibialInsertModel + TibialInsertWidth + TibialInsertType +
## PatellaModel + PatellaDiameter
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1952 558236
## 2 1863 518140 89 40097 1.6199 0.0003004 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The resulting BIC model scores marginally better in cross validation though the Anova results clearly prefer the full additive model.
Next, we will evaluate the RMSE of the best performing models thus far.
options(warn=-1)
sample_idx = sample(1:nrow(knee), 1500)
train_data = knee[sample_idx, ]
test_data = knee[-sample_idx, ]
rmse = function(actual, predicted) { sqrt(mean((actual - predicted) ^ 2))}
pat_full_test = rmse(test_data$KneeScore_Patient, predict(pat_full_additive_model, test_data))
pat_full_train = rmse(train_data$KneeScore_Patient, predict(pat_full_additive_model, train_data))
pat_test_mod_8 = rmse(test_data$KneeScore_Patient, predict(pat_model_eight, test_data))
pat_train_mod_8 = rmse(train_data$KneeScore_Patient, predict(pat_model_eight, train_data))
pat_bic_test = rmse(test_data$KneeScore_Patient, predict(pat_model_back_bic, test_data))
pat_bic_train = rmse(train_data$KneeScore_Patient, predict(pat_model_back_bic, train_data))
| Model | Train RMSE | Test RMSE |
|---|---|---|
Model All+ |
13.8075825 | 14.1531654 |
Model 8 |
16.5291748 | 16.3548583 |
Model BIC |
16.9360603 | 16.7169659 |
The additive model with all predictors has the best RMSE values.
Perhaps removing some influential data points will assist in creating a better model. Next we remove influential points that satisfy the criteria for Large Leverage, Large Cook's Distance and Large RStandard values.
new_dataframe = removeInfluential(pat_model_eight, knee)
pat_model_nine = lm(KneeScore_Patient ~ . - Diagnosis - Race - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - TibialTraySize - FemoralComponentType, data = new_dataframe)
evaluate(pat_model_nine)
## Test Result
## 1 Shapiro Wilk pvalue 1.475761e-09
## 2 Breusch-Pagan pvalue 0.001662082
## 3 R Squared 0.3104628
## 4 Adj R Squared 0.2893819
## 5 Cross Validation 15.11494
## 6 Large Hat Values 97
## 7 Influential NA
This does seem to be of moderate benefit in relation to adjusted \(R^2\) and cross validation.
We will evaluate the same model once again after removing the identified large hat values.
remove_hat_dataframe = removeLargeHatValues(pat_model_nine, new_dataframe)
pat_model_ten = lm(KneeScore_Patient ~ . - Diagnosis - Race - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - TibialTraySize - FemoralComponentType, data = remove_hat_dataframe)
evaluate(pat_model_ten)
## Test Result
## 1 Shapiro Wilk pvalue 3.905128e-08
## 2 Breusch-Pagan pvalue 0.002867139
## 3 R Squared 0.3047602
## 4 Adj R Squared 0.2893913
## 5 Cross Validation 15.40933
## 6 Large Hat Values 96
## 7 Influential 94
Removing the large hat values decreased the cross validation score. Perhaps we can find a better model by running BIC on the modified dataframe.
n = length(resid(pat_model_ten))
pat_model_10_bic = step(pat_model_ten, direction = "backward", k = log(n), trace = 0)
evaluate(pat_model_10_bic)
## Test Result
## 1 Shapiro Wilk pvalue 9.345871e-09
## 2 Breusch-Pagan pvalue 0.001236141
## 3 R Squared 0.2778953
## 4 Adj R Squared 0.2758345
## 5 Cross Validation 15.44057
## 6 Large Hat Values 92
## 7 Influential 124
Despite the reduction in leveraged values the selected model is no better than preceding models.
sample_idx_two = sample(1:nrow(remove_hat_dataframe), 1300)
train_data_two = remove_hat_dataframe[sample_idx_two, ]
test_data_two = remove_hat_dataframe[-sample_idx_two, ]
test_mod_10 = rmse(test_data_two$KneeScore_Patient, predict(pat_model_ten, train_data_two))
train_mod_10 = rmse(train_data_two$KneeScore_Patient, predict(pat_model_ten, train_data_two))
test_10_bic = rmse(test_data_two$KneeScore_Patient, predict(pat_model_10_bic, train_data_two))
train_10_bic = rmse(train_data_two$KneeScore_Patient, predict(pat_model_10_bic, train_data_two))
| Model | Train RMSE | Test RMSE |
|---|---|---|
Model 10 |
15.103835 | 19.9564655 |
Model 10 BIC |
15.4178179 | 19.8407548 |
Interestingly, again the models fit on the dataset without leverages fares worse on test RMSE.
# Warning: takes hours to fit
all_two_way_model = lm(KneeScore_Surgeon ~ .*., data = knee)
# Warning...
n = length(resid(all_two_way_model))
model_one_bic_all_two_way = step(all_two_way_model, direction = 'backward', k = log(n), trace = 0)
model_one_pat = lm(log(KneeScore_Patient) ~ ., data = knee)
model_one_surg = lm(log(KneeScore_Surgoen) ~ ., data = knee)
model_two_exp_pat= lm(KneeScore_Patient^2 ~ ., data = knee)
model_two_exp_surg = lm(KneeScore_Surgeon^2 ~ ., data = knee)
# replace all 0 values with the mean of the vector
temp = knee
temp$KneeScore_Patient[temp$KneeScore_Patient == 0] = mean(temp$KneeScore_Patient)
model_three = lm(KneeScore_Patient ~ ., data = temp)
evaluate(model_five)
clean_temp_data = removeLargeHatValues(model_three, temp)
model_four = lm(KneeScore_Patient ~ ., data = clean_temp_data)
evaluate(model_four)
pat_model = lm(KneeScore_Patient ~. , data = knee)
pat_model_back_aic = step(pat_model, direction = "backward", trace = 0)
pat_model_for_aic = step(pat_model, direction = "forward", trace = 0)
pat_model_both_aic = step(pat_model, direction = "both", trace = 0)
n = length(resid(pat_model))
pat_model_back_bic = step(pat_model, direction = "backward", k = log(n), trace = 0)
pat_model_for_bic = step(pat_model, direction = "forward", k = log(n), trace = 0)
pat_model_both_bic = step(pat_model, direction = "both", k = log(n), trace = 0)
evaluate(pat_model)
evaluate(pat_model_back_aic)
evaluate(pat_model_for_aic)
evaluate(pat_model_both_aic)
## Adj.R.Squared: 0.24 - CrossValidation: 16.9
evaluate(pat_model_back_bic)
evaluate(pat_model_for_bic)
evaluate(pat_model_both_bic)
## Remove large leverages
new_data = removeInfluential(pat_model, knee)
pat_model_without_large_leverages = lm(KneeScore_Patient ~ ., data = new_data)
evaluate(pat_model_without_large_leverages)
Removing the records with large hat values and refitting simply introduces more large hat values. What is unique about these rows of data?
which(hatvalues(pat_model) == 1.0)
print(knee[17,])
print(knee[240,])
print(knee[638,])
print(knee[1093,])
print(knee[1775, ])
pat_model_nine = lm(KneeScore_Patient ~ . - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - FemoralComponentType, data = knee)
evaluate(pat_model_nine)